Abstract: We study the problem of constructing secure regenerating
codes that protect data integrity in distributed storage
systems (DSS) in which some nodes may be compromised by a 
malicious adversary. The adversary can corrupt the data stored 
on and transmitted by the nodes under its control. The “damage” 
incurred by the actions of the adversary depends on how much 
information it knows about the data in the whole DSS. We focus 
on the limited-knowledge model in which the adversary knows 
only the data on the nodes under its control. The only secure 
capacity-achieving codes known in the literature for this model 
are for the bandwidth-limited regime and repair degree d = n - 1,
i.e., when a node fails in a DSS with n nodes, all the remaining
n - 1 nodes are contacted for repair. We extend these results
to the more general case of d ≤ n - 1 in the bandwidth-limited
regime. Our capacity-achieving scheme is based on the use of
product-matrix codes with special hashing functions and allows
the identification of the compromised nodes and their elimination
from the DSS while preserving the data integrity.

Index Terms: Distributed storage, regenerating codes, information
theoretic security, malicious adversary

I. Introduction 

We consider the problem of securing data in distributed
storage systems (DSS) under failure and repair (rebuilding) dynamics
against a malicious adversary that can control a certain
number of nodes in the system. DSS experience frequent node
failures due to the use of inexpensive commodity hardware
ID, Gl. Data redundancy is used to protect against data loss.
Typically, replication codes are used and multiple copies of
the data, usually 3, are stored in the DSS. Recently, major
cloud storage companies a, 11 have started using erasure
codes, such as regenerating codes a and locally repairable
codes a, to achieve data reliability with a lower storage cost
and better tradeoffs with other system resources, such as repair
bandwidth and data locality.

We assume that there is an adversary that can corrupt the 
data stored on the nodes under its control and all of their 
outgoing messages. The damage incurred by the adversary 
depends on how much information it knows about the stored 
file. We focus on the limited-knowledge model H, 13 in 
which the only information the adversary knows about the 
stored data comes from reading the data on the nodes under 
its control. This is in contrast with an omniscient adversary that,
although it controls only a limited number of nodes, has complete
knowledge of all the data in the DSS. Due to the distributed
nature of DSS, a limited-knowledge adversary may be a more 
suitable model in many applications. 


Classical codes, such as Reed-Solomon codes, can be used 
to correct errors. However, they may not be well suited to 
the secure distributed storage problem under consideration, for 
two main reasons: (1) they are designed for an omniscient 
adversary or random errors, (2) they result in high repair 
bandwidth. Take for instance a replication code that stores 
4 copies of a file on 4 different storage nodes, one of which 
is controlled by an adversary. If one of the other nodes fails 
and is repaired by contacting the remaining 3 nodes, it has to 
download all their data in order to do majority decoding and 
regenerate an uncorrupted copy of the file. This corresponds to 
a repair bandwidth equal to 3 times the size of the stored file. 
To address these challenges, we construct regenerating codes 
that can secure the data and enable an efficient repair process. 

Our objective is to achieve information theoretic security 
of the data that guarantees security even if the adversary has 
unlimited computation power. Computational security could be
achieved by storing on a trusted server cryptographic
hashes (SHA-2, etc.) of the data on each node.
Then, the hashes of the downloaded data can be computed 
and compared to the trusted hashes to determine possible 
corruption. This would work given the empirical assumption 
that it is computationally hard for the adversary to create a 
hash collision. Here, we do not make this assumption and show 
that there is no extra cost for achieving information theoretic 
security (as opposed to the omniscient case). Applications may 
include long term storage HI, as hash functions that are hard 
to break now are more likely to be broken in the future. 
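
The computational-security baseline described above can be sketched as follows. This is a minimal illustration only, not part of the proposed scheme: the helper names `digest` and `find_corrupted` and the sample data are hypothetical, and SHA-256 is used as one member of the SHA-2 family.

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 digest (a member of the SHA-2 family) as a hex string."""
    return hashlib.sha256(data).hexdigest()

def find_corrupted(downloads: dict[int, bytes], trusted: dict[int, str]) -> list[int]:
    """Return the ids of nodes whose downloaded data does not match the trusted digest."""
    return sorted(node for node, data in downloads.items()
                  if digest(data) != trusted[node])

# Hypothetical example: three nodes, node 2 returns tampered data.
stored = {1: b"block-one", 2: b"block-two", 3: b"block-three"}
trusted_digests = {node: digest(data) for node, data in stored.items()}
received = dict(stored)
received[2] = b"block-tw0"
print(find_corrupted(received, trusted_digests))  # prints [2]
```

Detection here rests on the assumed hardness of finding a collision, which is exactly the assumption the information theoretic approach avoids.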

Related work: The information theoretic security of DSS with
repair dynamics was first
studied in ii, m, M, where different adversarial models
were considered: passive (eavesdropper) and active (omniscient
vs. limited-knowledge). Further results on achieving security
in the DSS against eavesdropping appeared in HD-IHl and against an
omniscient adversary in ifTSll . ifThll . Schemes for protecting data
in DSS against random errors were studied in 1T3 . IHl.

Contribution: An upper bound on the secure capacity
of the DSS, also called resiliency capacity, for the limited-knowledge
model was proved in [6] (see Eq. (1)), together with
a scheme that achieves this upper bound in the bandwidth-limited
regime. This scheme is restricted to a repair degree d =
n - 1, i.e., all remaining nodes are contacted to repair a failed
node. In this paper, we prove that the upper bound in [6] is
achievable for any possible d ≤ n - 1 in the bandwidth-limited
regime provided that the number of compromised nodes is
below a certain limit. Thus, we characterize the secure capacity
of the DSS for the limited-knowledge adversary for the regime
we consider. Our proof is constructive and is based on using
the hashing schemes of [6] with an alternative representation
of Product-Matrix codes [19]. Compared to the scheme of [6],
our scheme not only allows secure data reconstruction by the
user, but also allows the regeneration of an uncorrupted exact
copy of the lost data during repair from a failure.

Organization: The paper is organized as follows. In
Sec. II we describe the system and adversary model. In
Sec. III we state our main result. We then describe
our secure code construction in Sec. IV and give an example of the
decoding algorithm in Sec. V. The proof of the main result is
given in Sec. VI.

II. System Model 

An (n, k, d) DSS is formed of n unreliable nodes
{1, 2, ..., n}, each with a storage capacity equal to α symbols.
A DSS allows any legitimate user to reconstruct its stored
file by connecting to any k out of the n nodes. Thus, it can
tolerate n - k simultaneous failures. We assume that the file
symbols are the realizations of iid random variables uniformly
distributed over a finite field.

When a node fails, the DSS is repaired by replacing the
failed node with a new one. The new node connects to d, k ≤
d ≤ n - 1, helper nodes, chosen out of the remaining n - 1
active ones, and downloads β symbols from each. The downloaded
symbols are processed and the result is stored on the new node.
So, we have α ≤ dβ. The literature distinguishes between two
types of repair: exact repair, where the regenerated data on
the new node is an exact copy of the lost data, and functional
repair, where it is functionally equivalent in the sense that it
retains the code's properties of file reconstruction and repair.
Functional repair is more tractable for theoretical analysis,
whereas practical systems typically require exact repair.

Adversary model: Our objective is to guarantee data reliability
and integrity even when b nodes in the DSS are compromised
by an adversary. We focus on a limited-knowledge
adversary that controls b nodes and can read and corrupt all
their stored data and/or their messages sent to other nodes
during repair or to the users during file reconstruction. The
only information the limited-knowledge adversary has about the stored
data comes from the data on the nodes under its control. This
is in contrast with an omniscient adversary that knows the
data on all the nodes but controls only a few nodes. We assume
that the adversary has complete knowledge of the storage and
repair schemes implemented in the DSS.

Trusted server: In addition to the previous setting, there
is a special node, referred to as a trusted server¹, that can
never be compromised by an adversary. The trusted server
is considered to be an expensive resource and has a limited
storage capacity. So, it cannot be used to store the data in the
DSS. It will be used to store hashes of the data.

¹ In case the trusted server does not exist, it can be emulated using the
nodes in the DSS using the technique in [6], Q


III. Main result 

The secure capacity C_s of a DSS against an adversary was
defined in 0, nni to be the maximum amount of information
that a user contacting k nodes can always decode with an
arbitrarily small probability of error and after any number of
(functional) repairs. The following upper bound was proven
in [6] in the case of a limited-knowledge adversary,

\[ C_s \le \sum_{i=b+1}^{k} \min\{\alpha, (d - i + 1)\beta\}. \tag{1} \]

Setting b = 0, i.e., when there is no adversary, we recover
part of the original result in 0 on the storage vs. repair
bandwidth tradeoff for a file of size B,

\[ B \le \sum_{i=1}^{k} \min\{\alpha, (d - i + 1)\beta\}. \tag{2} \]

Each term in the summation in Eq. (2) accounts for the
contribution of each of the k nodes, which is at most α
symbols but can be less due to correlation between the data
on the different nodes. Therefore, an intuitive justification of
Eq. (1) is that one strategy of the adversary is to always delete
the data on the nodes under its control (say, always change it to
the all-zero sequence). This results in the loss of the data on
b nodes, which in the worst case can be among the k nodes
contacted by the user. It is worth mentioning that the effect of
an omniscient adversary is the loss of 2b nodes from the DSS,
and the following upper bound was shown in 0, nni:
$C_s \le \sum_{i=2b+1}^{k} \min\{\alpha, (d - i + 1)\beta\}$, which can be regarded
as a generalization of the Singleton bound. This bound was
shown to be achievable in certain regimes 0, 00.
The only achievability result for the upper bound in Eq. (1) is
known for d = n - 1 in the bandwidth-limited regime. In this
regime, the only restriction is on the repair bandwidth β and
there is no restriction on the storage capacity of a node α, in
particular α ≥ dβ (the new node can store all the downloaded
data). Our main result, stated in Theorem 1, asserts the achievability
of this bound in the bandwidth-limited regime (including the
so-called minimum-bandwidth regime (MBR)) for any
possible repair degree d as long as b < k/2.

Theorem 1: The secure capacity of an (n, k, d) DSS with
exact repair in the bandwidth-limited regime is given by

\[ C_s = \sum_{i=b+1}^{k} (d - i + 1)\beta, \tag{3} \]

when b < k/2 nodes are compromised by a limited-knowledge
adversary.

The proof of Theorem 1 is constructive, and the rest of the paper
is dedicated to describing the scheme that achieves the
secure capacity.

IV. Secure Code Construction 

Our construction is based on the Product-Matrix (PM) codes
introduced by Rashmi et al. in [19] and the correlation hashing
scheme in [6]. We introduce a representation of PM codes that
is equivalent to the original codes in [19] in the sense that
one can be obtained as an invertible linear transformation of
the other. This representation reflects the intuition behind our
secure scheme and simplifies the proof.

Original MBR PM code: We consider an (n, k, d) DSS.
WLOG, we assume the repair bandwidth per link β is normalized
to 1 (β = 1). We choose α = dβ = d, which corresponds
to the MBR regime. We will explain the construction with
an example by taking (n, k, d) = (7, 3, 4) and following
the notation in [19]. The maximum file size that can be
stored is B = 9 (see Eq. (2)). The encoding matrix is
an n × d Vandermonde matrix Ψ with elements in F_q. We
denote the i-th row of Ψ by ψ_i. The stored file is denoted by
U = (U_1, ..., U_B), where the symbols U_i ∈ F_{q^v} are packets of
length v. The message symbols are arranged in a d × d matrix

\[ M = \begin{bmatrix} S & T \\ T^t & 0 \end{bmatrix}, \]

where S is a k × k symmetric matrix, T is a k × (d - k) matrix,
and 0 is a (d - k) × (d - k) zero matrix. For our example, S is
3 × 3 and T is 3 × 1, so that M holds the B = 9 file symbols, and
Ψ is defined over F_11. Node i stores the coded vector
W_i = ψ_i M. For example, on node 1 we store ψ_1 M =
[1 1 1 1] M. For details on repair and file reconstruction
we refer the reader to [19].

Modified PM code representation:

We define X_ij ∈ F_{q^v}, i, j ≤ n, i ≠ j, to be the symbol
sent by node i to node j when node j is being repaired.
From [19], we know that X_ij = ψ_i M ψ_j^t (the superscript t
denotes the transpose operation). Note that X_ij = X_ji because
M is symmetric. On each node i, we store any α = 4
different symbols among X_i1, ..., X_i(i-1), X_i(i+1), ..., X_in.
WLOG, we store the first α = 4 symbols as described in
Table I.

node 1    X_12   X_13   X_14   X_15    {X_16, X_17}
node 2    X_21   X_23   X_24   X_25    {X_26, X_27}
node 3    X_31   X_32   X_34   X_35    {X_36, X_37}
node 4    X_41   X_42   X_43   X_45    {X_46, X_47}
node 5    X_51   X_52   X_53   X_54    {X_56, X_57}
node 6    X_61   X_62   X_63   X_64    {X_65, X_67}
node 7    X_71   X_72   X_73   X_74    {X_75, X_76}

TABLE I
Symbols stored on each node in the modified PM code representation.
The symbols between braces ({}) are not stored but can be computed
using the stored symbols.

Here, the data stored on node i, say W'_i, can be written
as W'_i = W_i L, with L being an invertible matrix, hence the
equivalence of the two representations. For instance, W'_1 =
[X_12 X_13 X_14 X_15] = W_1 L, where L = [ψ_2^t ψ_3^t ψ_4^t ψ_5^t] is
invertible because it is a submatrix of the Vandermonde matrix
Ψ^t. Therefore, the exact repair and file reconstruction properties
are directly inherited from the original PM codes.

For each node i, any of the symbols X_ij can be computed
from any α = 4 other symbols (this follows directly from Ψ
being a Vandermonde matrix). Therefore, if node 1 is being
repaired by contacting helper nodes 2, 3, 4, 5, then it downloads
X_12, X_13, X_14, X_15 from each respectively. However, if
it contacts, say, nodes 2, 3, 4, 6, then it downloads X_12, X_13, X_14, X_16
from each respectively. It then computes X_15 using the downloaded
data and stores X_12, X_13, X_14, X_15.

The hashes: The correlation hashing scheme introduced in
[6] will be used to cross-check the data on the different
nodes against each other and will allow the detection of the
compromised nodes (with high probability) when used with PM
codes. We abuse notation and write X_ij as a vector in F_q^v,
X_ij = (X_ij^1, X_ij^2, ..., X_ij^v), where X_ij^s ∈ F_q. The hash of two
distinct symbols X_ij and X_lk is defined as the dot product
$X_{ij} \cdot X_{lk} = \sum_{s=1}^{v} X_{ij}^{s} X_{lk}^{s} \in F_q$. All these hashes are stored
on the trusted server and cannot be corrupted by the adversary.
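
The modified representation can be checked numerically with the minimal sketch below, under several assumptions that are not taken from the paper: the field is the prime field of size 11, packets are scalars (v = 1), ψ_i = [1, i, i², i³], and the placement of the file symbols inside M is arbitrary. The function names are hypothetical. Recomputing X_15 is done here by Lagrange interpolation, which is one possible way to exploit the Vandermonde structure, since X_1j is a polynomial of degree at most 3 evaluated at j.

```python
import random

P = 11              # assumed prime field size; packets are scalars (v = 1) for brevity
N, K, D = 7, 3, 4   # the (n, k, d) = (7, 3, 4) example

def message_matrix(u):
    """Arrange 9 symbols into the symmetric d x d matrix M = [[S, T], [T^t, 0]].
    The placement of individual symbols inside S and T is an assumption."""
    m = [[0] * D for _ in range(D)]
    it = iter(u)
    for r in range(K):
        for c in range(r, D):   # upper triangle of S, then row r of T
            m[r][c] = m[c][r] = next(it)
    return m

def psi(i):
    """Row i of the Vandermonde encoding matrix: [1, i, i^2, i^3] mod P."""
    return [pow(i, t, P) for t in range(D)]

def symbol(m, i, j):
    """X_ij = psi_i M psi_j^t, the symbol node i sends when node j is repaired."""
    pi, pj = psi(i), psi(j)
    return sum(pi[r] * m[r][c] * pj[c] for r in range(D) for c in range(D)) % P

def interpolate(points, x):
    """Evaluate at x the polynomial of degree < len(points) through the points, mod P."""
    total = 0
    for a, (xa, ya) in enumerate(points):
        num = den = 1
        for b, (xb, _) in enumerate(points):
            if a != b:
                num = num * (x - xb) % P
                den = den * (xa - xb) % P
        total = (total + ya * num * pow(den, -1, P)) % P
    return total

random.seed(0)
M = message_matrix([random.randrange(P) for _ in range(9)])

# X_ij = X_ji because M is symmetric.
assert all(symbol(M, i, j) == symbol(M, j, i)
           for i in range(1, N + 1) for j in range(1, N + 1) if i != j)

# Node 1 is repaired from helpers 2, 3, 4, 6 and recomputes X_15.
downloaded = [(h, symbol(M, h, 1)) for h in (2, 3, 4, 6)]
assert interpolate(downloaded, 5) == symbol(M, 1, 5)
```

The modular inverse via `pow(den, -1, P)` requires Python 3.8 or later.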


V. Secure Decoding Algorithm: Example 

Here, we describe how the hashing scheme in [6] can be
used with the PM codes to achieve security. We describe the
secure repair and file reconstruction scheme through an example.
Consider the problem of securing an (n, k, d) = (7, 4, 5)
DSS against an adversary controlling b = 1 node. We
use the (n, k' = k - b, d' = d - b) = (7, 3, 4) PM code
described in Sec. IV. Note that k = 4 and d = 5 nodes will
still be contacted during file reconstruction and node repair
respectively, i.e., b = 1 more node is contacted than the
underlying PM code is designed for. The idea is to
use the hashes to identify the compromised node, then ignore
its data by treating it as an erasure. The back-off of b in
k and d makes it possible to repair and reconstruct the data even
in the presence of the erasure. Therefore, we can guarantee
the security of the data in the system and can detect and
report the compromised nodes. In the following discussion,
we focus on the detection of the corrupted nodes. Notice that
the limited-knowledge adversary controlling b = 1 node,
say node 3, can observe the data stored on this node and only
b = 1 symbol on each other node. This happens because
X_3i = X_i3 for all i = 1, ..., n.

A. Secure repair 

Assume WLOG that in Table I node 3 is compromised and
that node 2 fails. The new node contacts d = 5 nodes, say
nodes 1, 3, 4, 5 and 6, for repair. Each helper node h sends
X_h2 = X_2h to the new node 2. The new node computes the
correlation hashes of the downloaded data and compares them to
the corresponding hashes downloaded from the trusted server.
Then, it constructs a symmetric comparison table similar to
the example shown in Table II.

TABLE II
A comparison of the computed hashes of the downloaded data vs. the
downloaded hashes from the trusted server. A blank indicates that the
product is not computed, a ✓ indicates that the hashes are equal, while
an × indicates that the hashes do not match.

The repaired node looks for a column with 2 ×'s or more to
identify the compromised node. The reason is that a column
with a single ×, say in the entry corresponding to X_12 and
X_32, is ambiguous and can mean that either node 1 or node 3 is
compromised.

The example in Table II assumes the ideal case in which all
the corrupted symbols result in a hash mismatch. However, the
adversary can always attempt to corrupt the data in a way that
matches the downloaded hashes, i.e., creates a hash collision. For
example, it will be able to fool the new node if it manages to
change at least 3 ×'s into ✓'s in the column of X_32. However,
its observation, consisting of only (X_31, X_32, X_34, X_35) here,
is uncorrelated (linearly independent) with the other downloaded
packets X_12, X_42, X_52 (this follows from the construction
of PM codes). Hence, its best strategy cannot be better than
introducing random errors, which would be caught with high
probability (see Section V-C). After identifying the compromised
node(s), the new node will regenerate the lost data using
the symbols received from the other helper nodes and report
the compromised node(s).
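
The detection step of the secure repair can be sketched as below. This is an illustration under assumptions: the field size 11, the packet length v = 3, the packet values and the function names `dot` and `suspects` are all hypothetical, and the packets are arbitrary vectors rather than symbols of an actual PM code.

```python
from itertools import combinations

Q = 11  # assumed prime field size

def dot(x, y):
    """Correlation hash of two packets: their dot product mod Q."""
    return sum(a * b for a, b in zip(x, y)) % Q

def suspects(received, trusted, b):
    """Helpers whose column of the comparison table has more than b mismatches."""
    mismatches = {h: 0 for h in received}
    for g, h in combinations(sorted(received), 2):
        if dot(received[g], received[h]) != trusted[(g, h)]:
            mismatches[g] += 1
            mismatches[h] += 1
    return {h for h, count in mismatches.items() if count > b}

# Hypothetical packets X_h2 sent by helpers 1, 3, 4, 5 and 6 to the new node 2.
honest = {1: [3, 1, 4], 3: [1, 5, 9], 4: [2, 6, 5], 5: [3, 5, 8], 6: [9, 7, 9]}
trusted = {(g, h): dot(honest[g], honest[h])
           for g, h in combinations(sorted(honest), 2)}

received = dict(honest)
received[3] = [1, 5, 10]  # node 3 adds the error e = (0, 0, 1)
print(suspects(received, trusted, b=1))  # prints {3}
```

Each honest helper shows a single mismatch (against node 3), while node 3 shows four, so only node 3 exceeds the threshold b = 1.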

B. Secure reconstruction 

To reconstruct a file, a user contacts any k = 4 nodes, say
nodes 1, 2, 3 and 4, downloads all their symbols and computes
their correlation hashes. Then, the user compares the computed
hashes with the corresponding ones downloaded from the
trusted server and constructs a k × k = 4 × 4 block-table
formed of blocks each of size (d - b) × (d - b) = 4 × 4.
Table III depicts the part of this table corresponding to nodes 3
and 4.

We say that a block is mismatched if it has at least one ×. In
the reconstruction block-table, the column corresponding to a
non-compromised node will contain at most b = 1 mismatched
block. Therefore, the user will identify the compromised
node by looking for a column with more than one mismatched
block. The user then discards the data from the compromised
node and decodes the file using the symbols downloaded from
the other 3 nodes. It can do so because the underlying PM code
has an effective k' = 3. Here too, the adversary can attempt
to avoid being detected by choosing erroneous symbols that
lead to a match between the computed and trusted hashes. For
instance here, the adversary needs to flip all the ×'s in two
blocks in the column of node 3 in order to be undetected.
Next, we will analyze the probability of this event and show
that it can be made arbitrarily small.

TABLE III
Part of the reconstruction hash table formed by a user contacting
nodes 1, 2, 3 and 4. The columns corresponding to nodes 1 and 2 are
omitted here due to space constraints.

C. Error analysis

The error analysis in [6] can be applied here to prove that
the probability of failing to detect a compromised node during
repair or reconstruction is upper bounded by 1/q and therefore
can be made as small as desired by increasing q. Suppose the
adversary wants to flip an × to a ✓ in a comparison table,
say the × corresponding to X_12 and X_32 in Table II. It can
introduce an error e to X_32 so as to change it to X_32 + e and will
succeed in flipping the entry in the table if e is orthogonal to
X_12. Assuming the symbols of the original file are iid and
uniformly distributed over F_q^v, for a fixed nonzero error e ∈ F_q^v,
there are q^{v-1} possible values of X_12 that are orthogonal to
e. Hence, the probability of e being orthogonal to one symbol
given the adversary's observation is

\[ P(e \perp X_{12}) = \frac{q^{v-1}}{q^{v}} = \frac{1}{q}. \]

Of course, if the adversary were omniscient, it would know the
value of X_12 and could choose a value of e that is orthogonal
to X_12. Note that the adversary needs to flip many entries
in the comparison table to fool the secure repair and/or the
reconstruction algorithms described earlier. These events are
not independent and we upper bound the probability of error,
i.e., the adversary being undetected, by 1/q.
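
The counting argument behind the 1/q probability can be verified by brute force for small parameters. The sketch below is illustrative only: q = 5, v = 3 and the error vector are arbitrary choices, q is assumed prime so that arithmetic mod q is field arithmetic, and `count_orthogonal` is a hypothetical helper name.

```python
from itertools import product

def count_orthogonal(e, q):
    """Number of vectors x of length len(e) over F_q with e . x = 0, by exhaustive search."""
    v = len(e)
    return sum(1 for x in product(range(q), repeat=v)
               if sum(a * b for a, b in zip(e, x)) % q == 0)

q, v = 5, 3          # small illustrative parameters
e = (2, 0, 3)        # an arbitrary fixed nonzero error vector
hits = count_orthogonal(e, q)
print(hits, q ** (v - 1), hits / q ** v)  # prints 25 25 0.2
```

The count equals q^{v-1} for any nonzero e, so a uniformly distributed X_12 is orthogonal to e with probability exactly 1/q.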

VI. Proof of Main Result 

We show that the code construction described in Sec. IV,
based on PM codes and correlation hashes, achieves the
resiliency capacity in Theorem 1. To secure an (n, k, d) DSS
against an adversary controlling b nodes, the construction
uses an (n, k' = k - b, d' = d - b) minimum-bandwidth
regenerating PM code with storage per node α = (d - b)β.
These codes achieve the (non-secure) optimal storage/repair
bandwidth tradeoff and can store a file of size B =
$\sum_{i=1}^{k-b} \min\{\alpha, (d - b - i + 1)\beta\}$ symbols Q, [19]. A simple
change of variables shows that B = C_s. We prove Theorem 1 by
showing that these B symbols can be secured by following
the secure repair and secure decoding rules in Sec. V.
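
The change of variables can be spelled out as follows. With α = (d - b)β, every term of the sum satisfies (d - b - i + 1)β ≤ α for i ≥ 1, so the minimum is always attained by the second argument; substituting j = i + b then gives the expression in Eq. (3).

```latex
\[
B = \sum_{i=1}^{k-b} \min\{\alpha, (d-b-i+1)\beta\}
  = \sum_{i=1}^{k-b} (d-b-i+1)\beta
  = \sum_{j=b+1}^{k} (d-j+1)\beta
  = C_s .
\]
```

For the running example (n, k, d) = (7, 4, 5) with b = 1 and β = 1, both sides evaluate to 4 + 3 + 2 = 9.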

A. Secure repair 

If a node f fails, the new node contacts d helper nodes,
h_1, h_2, ..., h_d, for repair. Each helper node h_i sends X_{h_i f}
to the new node. In addition, the new node downloads all the
hashes X_{h_i f} · X_{h_j f} from the trusted server and forms a repair
comparison table similar to Table II. WLOG, suppose that
among the helper nodes exactly b nodes are compromised².
Also, suppose the ideal case in which every transmitted packet
corrupted by the adversary results in an × in the comparison
table. An uncorrupted node will contain at most b ×'s in its
column. Therefore, every helper node corresponding to more
than b ×'s must be a compromised node.

Secure repair rule: The repaired node decides that any
helper node corresponding to a column (or a row) in the
comparison table with more than b ×'s is a compromised node.

The adversary can try to avoid being detected by causing
errors that lead to switching at least d - 2b ×'s into ✓'s in
columns corresponding to nodes under its control. However,
since we assume that b < k/2, the adversary has no
information about the symbols sent by the helper nodes
that are not under its control (a property of the underlying
(n, k - b, d - b) PM code). Therefore, the error analysis detailed
in Sec. V-C holds here and the probability of the adversary
succeeding, i.e., causing undetected erroneous repair data, is
upper bounded by 1/q.

B. Secure reconstruction 

A user that wants to reconstruct the file contacts k nodes
and downloads all their stored data (α = d - b symbols from
each node). In addition, it contacts the trusted server and
downloads all the hashes corresponding to the downloaded
symbols. WLOG, suppose that b nodes among those
contacted are compromised. Then, the user forms the
reconstruction block-table between the trusted hashes and
computed hashes as done in Table III.

Each block contains the hashes cross-checking the symbols
downloaded from two different nodes. A mismatched block
indicates that one of these two nodes is sending corrupted
packets and therefore is compromised. Hence, a node
having at least b + 1 mismatched blocks has to be compromised.
Otherwise, the b + 1 other nodes corresponding to
the mismatched blocks are all compromised, which contradicts
our initial assumption that there are at most b compromised
nodes. Therefore, the user implements the following rule:

Secure reconstruction rule: The user decides that any node 
having more than b mismatched blocks in its row or column 
is a compromised node. 

Treating the b compromised nodes as erasures, the user
recovers the data using the k - b other contacted nodes.
Moreover, it reports the identity of the compromised nodes.
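
The secure reconstruction rule can be sketched as follows. This is only an illustration under assumptions: the toy data below (four nodes, two packets of length 2 each, field size 11) is hypothetical and is not generated by a PM code, and the function names are not from the paper. A block is the table of hashes between all packets of two nodes.

```python
from itertools import combinations

Q = 11  # assumed prime field size

def dot(x, y):
    """Correlation hash of two packets: their dot product mod Q."""
    return sum(a * b for a, b in zip(x, y)) % Q

def block_mismatched(syms_g, syms_h, trusted_block):
    """A block is mismatched if any computed hash differs from the trusted one."""
    return any(dot(x, y) != trusted_block[r][c]
               for r, x in enumerate(syms_g) for c, y in enumerate(syms_h))

def compromised_nodes(downloads, trusted, b):
    """Nodes with more than b mismatched blocks in their row or column."""
    count = {node: 0 for node in downloads}
    for g, h in combinations(sorted(downloads), 2):
        if block_mismatched(downloads[g], downloads[h], trusted[(g, h)]):
            count[g] += 1
            count[h] += 1
    return {node for node, c in count.items() if c > b}

# Hypothetical stored packets on the k = 4 contacted nodes.
honest = {1: [[1, 2], [3, 4]], 2: [[5, 6], [7, 8]],
          3: [[9, 1], [2, 3]], 4: [[4, 5], [6, 7]]}
trusted = {(g, h): [[dot(x, y) for y in honest[h]] for x in honest[g]]
           for g, h in combinations(sorted(honest), 2)}

received = {n: [list(p) for p in pkts] for n, pkts in honest.items()}
received[3][0] = [9, 2]  # node 3 corrupts its first packet
print(compromised_nodes(received, trusted, b=1))  # prints {3}
```

Node 3 ends up with three mismatched blocks and every other node with one, so with b = 1 only node 3 is flagged and its data can be treated as an erasure.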

Note that if b ≥ k/2, then the limited-knowledge adversary
can decode the whole file and is actually omniscient. In this
case, our scheme cannot be used to achieve security since the
adversary can carefully choose the errors so as to switch all the
×'s into ✓'s in the comparison tables. If b < k/2, then on
each uncompromised node there is at least one symbol that
the adversary does not have information about, and the error
analysis in Sec. V-C holds again here.

² The analysis stays the same even if fewer than b helper nodes happen to be
compromised.

C. Hash storage overhead 

The total number of stored hashes is $\binom{\theta}{2}$ symbols in F_q,
where $\theta = \binom{n}{2}$ is the total number of the X_ij symbols. The
ratio of the hash size to the stored data size is O(1/v),
which can be made arbitrarily small by increasing the packet
length v. In the given example this ratio is